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OPTIMAL SMOOTHING IN NONPARAMETRIC MIXED-EFFECT 

MODELS 1 

By Chong Gu and Ping Ma 

Purdue University and Harvard University 

Mixed-effect modefs are wideiy used for the analysis of corre- 
lated data such as longitudinal data and repeated measures. In this 
article, we study an approach to the nonparametric estimation of 
mixed-effect models. We consider models with parametric random ef- 
fects and flexible fixed effects, and employ the penalized least squares 
method to estimate the models. The issue to be addressed is the 
selection of smoothing parameters through the generalized cross- 
validation method, which is shown to yield optimal smoothing for 
both real and latent random effects. Simulation studies are con- 
ducted to investigate the empirical performance of generalized cross- 
validation in the context. Real-data examples are presented to demon- 
strate the applications of the methodology. 

1. Introduction. Mixed-effect models are widely used for the analysis of 
data with correlated errors. The linear mixed-effect models, also known as 
variance component models, are of the form 

(1.1) Yi = x Jf3 + zJh + £i, 

i = 1, . . . , n, where are the fixed effects, zfh are the random effects with 
b ~ N(0,B), and ~ N(0,a 2 ) are independent of b and of each other; see, 
for example, [5] and [12]. The unknown parameters are j3, B and a 2 , which 
are to be estimated from the data. Nonlinear and nonparametric generaliza- 
tions of (1.1) can be found in, for example, [8, 11, 17]. 
In this article, we consider models of the form 

(1.2) Y i = V (x i ) + zJh + £ i , 



Received May 2002; revised August 2004. 
Supported by NIH Grant R33HL68515. 

AMS 2000 subject classifications. Primary 62G08; secondary 62G05, 62G20, 62H12, 
41A15. 

Key words and phrases. Correlated error, generalized cross-validation, longitudinal 
data, mixed-effect model, penalized least squares, repeated measures, smoothing spline. 

This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Statistics, 
2005, Vol. 33, No. 3, 1357-1379. This reprint differs from the original in 
pagination and typographic detail. 



1 



2 



C. GU AND P. MA 



where the regression function r](x) is assumed to be a smooth function on a 
generic domain X. The model terms r\{x) or rj(x) + z T h will be estimated 
using the penalized (unweighted) least squares method through the mini- 
mization of 

1 n 1 
(1.3) - J2 {Yi - r,{ Xi ) - zjhf + -b T Sb + XJ( V ), 

i=i 

where the quadratic functional J(rj) quantifies the roughness of 77 and the 
smoothing parameter A controls the trade-off between the goodness-of-fit 
and the smoothness of 77; note that if one substitutes a 2 B _1 for S in (1.3), 
then the first two terms are proportional to the minus log likelihood of (Y, b) . 
We will treat £ as a tuning parameter like A, however, and not be concerned 
with the estimation of a 2 B~ l . Technically, (1.3) resembles a partial spline 
model, but with the partial terms z T h penalized. 

Absent the random effects z T b, penalized least squares regression has 
been studied extensively in the literature; see, for example, [16] and [2] for 
comprehensive treatments of the subject. The models of (1-2) were first con- 
sidered by Wang [17], who used penalized marginal likelihood (of Y) to 
estimate r\. Smoothing parameter selection in penalized marginal likelihood 
estimation with correlated data was studied by Wang [18], who illustrated 
the middling performance of various versions of cross-validation, in contrast 
to the more reliable performance of the generalized maximum likelihood 
method of Wahba [15] derived under the Bayes model of smoothing splines. 
Under the Bayes model, 77 itself is decomposed into fixed and random ef- 
fects, with XJ(r]) acting as the minus log likelihood of the random effects; 
the generalized maximum likelihood method of Wahba [15] is essentially the 
popular restricted maximum likelihood method widely used for the estima- 
tion of variance component models. 

The purpose of this article is to study the estimation of the model terms 
in (1.2) through the minimization of (1.3), with the smoothing parameter A 
and the correlation parameters £ selected by the standard generalized cross- 
validation method of Craven and Wahba [1] , which was developed for inde- 
pendent data. In some applications, the random effects z T h are physically 
interpretable, or real, and in some others, z T h are merely a convenient device 
for the modeling of variance components, or latent; for the latter case, the 
estimation through (1.3) turns the variance components into "mean compo- 
nents." For both real and latent random effects, generalized cross-validation 
will be shown to yield optimal smoothing, through asymptotic analysis and 
numerical simulation. Real-data examples are also presented to illustrate the 
applications of the methodology. 

The rest of this article is organized as follows. In Section 2 the problem 
is formulated and preliminary analysis is conducted. Examples are given in 



SMOOTHING IN MIXED-EFFECT MODELS 



3 



Section 3. Generalized cross-validation and its optimality are discussed in 
Section 4, followed by simulation studies in Section 5. Real-data examples 
are shown in Section 6. Proofs of the theorems and lemmas in Section 4 are 
collected in Section 7. A few remarks in Section 8 conclude the article. 



2. Penalized least squares estimation. Consider the minimization of (1.3) 
for rj in a g-dimensional space span{£i, . . . , Functions in the space can 
be expressed as 



'x)c. 



(2.1) r](x) = J2cMx)=Z 
Plugging (2.1) into (1.3), one minimizes 

(2.2) (Y-Rc- Zb) T (Y -Rc- Zb) + b T Sb + n\c T Qc 

with respect to c and b, where S > is p x p, R is n x q with the (i, j)th 
entry £j(Xi), Z = (z±, . . . ,z n ) T is n x p and Q is q x q with the (j, k)th 
entry J(£j,£fc). Differentiating (2.2) with respect to c and b and setting the 
derivatives to 0, one has 



(2.3) 



R T R + nXQ R T Z \ { c\ _ { R T Y 
Z T R Z T Z + S/ lb/"i Z T Y 



Assume that the linear system is solvable, that is, the columns of (^t) are 
in the column space of the left-hand side matrix. A solution of (2.3) is then 
given by 

c\_(R T R + n\Q R T Z \+ f R T Y 
h ) ~ \ Z T R Z T Z + S ) \ Z T Y 

where C + denotes the Moore-Penrose inverse of C satisfying CC + C = C, 
C+CC+ = C+, {CC+) T = CC+ and {C + C) T = C+C. 

Write D = Z T Z + E and E = (R T R + nXQ) - R T Z D^ 1 Z T R. With (2.3) 
solvable, one has 

R T R + nXQ R T Z\(K\_(R T 
Z T R D J \ L J ~ \ Z T 

for some K and L, which, after some algebra, yields EK(I - ZD~ x Z T y x = 
R T , so the columns of R T are in the column space of E. It follows that 
EE+R T = R T , and in turn 

R T R + nXQ R T Z 
Z T R Z T Z + S 

E+ -E+RTZD- 1 
-D~ 1 Z T RE + D" 1 + D~ 1 Z T RE + R T ZD~ l 
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It then follows that 

(2.4) f] = Rc = RE + R T (I - ZD~ 1 Z T )Y = MY. 

Similarly, one has 
Y = Rc + Zb 

= {(I - ZD- 1 Z T )RE + R T (I - ZD- 1 Z T ) + ZD^Z^Y = A(X, E)Y, 
where 

(2 5) 

= (I- ZD- 1 Z T )RE + R T (I - ZD^ 1 Z T ) + ZD~ 1 Z T 

is known as the smoothing matrix. Alternatively, for E = R T R + nXQ and 
D = D — Z T RE + R T Z , one may write 

R T R + nXQ R T Z 
Z T R Z T Z + E 

E+ + E+R T ZD~ 1 Z T RE + —E + R T ZD~ 1 

-b- 1 z T RE+ b- 1 

yielding the expressions 

(2.6) M = 1(A) - A{\)Z{Z T {I - A{\))Z + ^y 1 Z T (I - 1(A)), 

where ^4(A) = RE + R T is the smoothing matrix when the random effects are 
absent, and 

(2.7) A(X, E) = 1(A) + (I - i(A))Z(Z T (/ - A{\))Z + H)~ 1 Z T (I - 1(A)). 

The eigenvalues of ^4(A,E) and A(X) are in the range [0, 1]. 

With the standard formulation of penalized least squares regression, the 
minimization of (1.3) is performed in a so-called reproducing kernel Hilbert 
space 7i C {rj: J(rj) < oo} in which J(rj) is a square seminorm, and the so- 
lution resides in the space Mj © sp&n{Rj(xi, •), % = 1, . . . , n}, where Aj = 
{77 : J{rj) = 0} is the null space of J(r]) and Rj(-, •) is the so-called reproduc- 
ing kernel in H © Aj. The solution has an expression 

m n 

(2.8) 7](x) = ^2d u cf) u (x) + ^2ciRj(xi,x), 

i=l i=l 

where {4> u }T=i * s a basis of Aj. It follows that i? = (S,Q), where 5 is n x 
m with the (i,v)th entry (\) v {xi) and Q is n x n with the (i,j)th entry 
Rj(xi,Xj). From the property of reproducing kernels, it can also be shown 
that J(Rj(xi, -),Rj(xj, •)) = Rj(xi,Xj), so Q = diag(0, Q). See, for example, 
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[2] and [16]. The linear system (2.3) is thus solvable as long as Z is of full 
column rank. 

For fast computation, Kim and Gu [9] consider the space J\fj ©span{i? j(zj, •), 
j = l,...,q}, where {zj} are a random subset of {x{\. In that setting, 
R = (S,R), where R is n x q with the (i,j)th entry Rj(zj,Xi), and Q = 
diag(0,Q), where Q is q x q with the (j, k)th entry Rj(zj,z^). Since J(rj) 
is a square norm in span{Rj(zj, •),_?' = 1, ... , q}, it can be shown that the 
columns of R T are in the column space of Q. It then follows that the linear 
system (2.3) is solvable when Z is of full column rank. 

The formulation of (2.1) and (2.2) also covers general penalized regression 
splines, so long as (2.3) is solvable. A sufficient condition is for both R and 
Z to be of full column rank. 

3. Examples. A few examples are in order to illustrate the formulation 
of the problem and the potential applications of the method under study. 
The examples will be employed in the simulation study of Section 5 and the 
data analysis of Section 6. 

Example 3.1 (Growth curves). Consider the "growth" over time of a 
certain quantity associated with p subjects, 

Yi =7}(xi) +b Si +Si, 

where Yi is the ith observation taken at time Xi S [0, a] from subject Sj G 
{1, . . . ,p}, and b s ~ iV(0, cr^) are the subject random effects, independent of 
the measurement error £j and of each other. In this setting, B = cr^I, so the 
p x p matrix £ is diagonal with only one tunable parameter. The random 
effects b s are real. 

Taking J(rj) = j^(d 2 ri/dx 2 ) 2 dx, one has the cubic smoothing spline, with 
the 4>v and Rj functions in (2.8) given by 

4>i(x) = l, 4> 2 (x)=x, Rj(x 1 ,x 2 )= [ (x 1 -u) + (x 2 -u) + du, 

Jo 

where (•)+ =max(-,0). See, for example, [2], Section 2.3.1. The null space 
model has the expression rj(x) = @o + f3\x. 

Taking J(rf) = J a (L e r]) 2 h g dx , where L e = (d/dx)(d/dx + 9) and hg = e 39x 
for some 6 > 0, one has a (negative) exponential spline. The null space model 
has the expression r)(x) = (3q + \3\e~ Sx . Transforming x by x = (1 — e~ 9x )/9, 
it can be shown that 

[ a (L e rj) 2 h e dx= [ a (d 2 r]/dx 2 ) 2 dx, 
Jo Jo 

where a= (1 - e~ ea )/6, so one has a cubic spline in x. See, for example, [2], 
Example 4.7, Section 4.3.4. Note that the exponential spline reduces to the 
cubic spline in x when 6 = 0. 
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Suppose Y is the logarithm of the measurement Y satisfying a log-normal 
distribution with \x = f](x) + b s and a 2 a constant; the mean of Y is known to 
be exp(/j + <r 2 /2). The null space model of the cubic spline characterizes an 
exponential growth curve for Y, and the null space model of the exponential 
spline corresponds to a Gompertz growth curve for Y. The splines allow 
departures from these parametric growth curves. 

Example 3.2 (Growth under treatment). Consider the setting of Exam- 
ple 3.1, but with the p subjects divided into t treatment groups. The fixed 
effect becomes n(x,r), where r G {1, . . . ,t} denotes the treatment level. For 
the identifiability of t](x,t) and b s , one needs more than one subject per 
treatment level. One may decompose 

t](x,t) = 770 + 771(2:) + ??2(t) + 771,2(2:, r), 

where 770 is a constant, 771(2:) is a function of x satisfying 771 (0) = 0, 772 (r) is a 
function of r satisfying Y%=i V2(t) = 0, and 771,2(2;, r) satisfies 771,2(0,7") = 0, 
Vr, and J2t=i Vipfei T ) = 0, Vx. The term 770 + 771(2;) is the "average growth" 
and the term 772(7") + 771,2(2;, r) is the "contrast growth." 
For flexible models one may use 

j( ?? ) = 0- 1 Hd^n/dx^dx + e^l [ a y{d 2 r, l2 /dx 2 ) 2 dx, 

Jo ' Jo ^ 

which has a null space Mj of dimension It. A set of <p v is given by 

{l,x,I [r=j] - 1/t, (I[ T=j ] - l/t)x,j = l,...,t-l}, 
and the function Rj is given by 

Rj{x 1 ,t 1 ;x 2 ,t 2 ) = 9i I {x 1 -u) + {x 2 -u) + du 
Jo 

+ #i,2(-f[ T1 =T 2 ] - V*) J ( x i - u)+{x 2 - u)+ du. 

See, for example, [2], Section 2.4.4, Problem 2.14(c). To force an additive 
model 77(2,7") = 7/0 + 771(2;) + 772 (r), which yields parallel growth curves at 
different treatment levels, one may set #1,2 = and remove (I[ T =j] ~ l/t)x 
from the list of 4> u . One may also choose to transform x through x = (1 — 
e~ 9x )/9 and fit models on the x scale. 

Example 3.3 (Clustered observations). Consider observations from p 
clusters, such as in multicenter studies, Y$ = 77(2^) +£j, where Yi is taken 
from cluster Cj with covariate X{. Observations from different clusters are 
independent, while observations from the same cluster may be correlated 
to various degrees. The intracluster correlation may be modeled via £j = 
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b Ct +£i, where b~7V(0,5), with B = diag(o-f , . . . , o 2 ), and e ~ N(0, o 2 I), 
independent of each other; the intracluster correlation in cluster a is given 
by oil {a + of). In this setting, the p x p matrix E involves p tunable 
parameters on the diagonal. The random effects b c are latent. 

Note that the covariate x is generic, which can be univariate as in Exam- 
ple 3.1, or multivariate as in Example 3.2. 

4. Optimality of generalized cross-validation. For the selection of the 
smoothing parameter A (and others such as the 9 in Example 3.1 and the 
0\ and #1,2 in Example 3.2, if present) and the correlation parameters £, we 
propose to minimize the generalized cross-validation score 

n^Y T (/-.4(A,E)) 2 Y 



(4.1) V(X,T,) 



{u-Ht(I- A(A,E))} 2 ' 



£ may involve less than p(p + l)/2 tunable parameters. It will be shown 
in this section that the minimizers of V(A, E) yield optimal smoothing 
asymptotically, in the sense to be specified. Numerical verifications of the 
asymptotic analysis will be presented in the next section. Generalized cross- 
validation was proposed by Craven and Wahba [1] for independent data, 
with the asymptotic optimality established by Li [10] in that setting; see 
also [13]. 

First consider the mean square error at the data points, 

1 n 

(4.2) L X (A, E) = - (Xi ~ Vixi) ~ zfb) 2 , 

which is a natural loss when the random effects z T b are real. Simple algebra 
yields 

Li (A, E) = - (AY -rj- Zh) T (AY - rj - Zb) 

n 

= -(/? + Zb) T (I - A) 2 (r] + Zb) 

n 

- -(rj + Zbf(I - A)Ae + -e T A 2 e, 
n n 

where 77 = (rj(xi), . . . ,rj(x n )) T , Y = r) + Zb + e and the arguments (A, E) are 
dropped from the notation of the smoothing matrix A. Taking expectation 
with respect to b and e, the risk is seen to be 

J?i(A,S)=^[L 1 (A,E)] 

( 4 - 3 ) 

= LrjTfj _ A f +I tr ((/ _ A ) 2 ZBZ T ) + —tiA 2 . 
n n n 
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Now define 

(4.4) U(X,J:) = -Y T (I-A) 2 Y + -a 2 tvA. 

n n 

It follows that 

E/(A,E)-Li(A,E) - -e T e 
n 

(4.5) 

= -(tj + Zb) T (I - A)e - -{e T Ae - aHiA). 

n n 

We shall establish the optimality of U(X, E) under the following condi- 
tions. 

Condition C.l. The eigenvalues of E(Z T (I - A(\))Z + E)"^ are 
bounded from above. 

Condition C.l holds for E with eigenvalues bounded from above, and for 
E of magnitude up to the order of 0(y/n) when the magnitude of Z T (I — 
A(X))Z grows at the rate of 0(n). 

Condition C. 2. As n — > oo, rci?i(A, E) — > oo. 

The condition simply concedes that the parametric rate of 0(n _1 ) is not 
achievable. In the absence of random effects, for 77 satisfying J(rj) < oo or 
more stringent smoothness conditions, it typically holds that n -1 77 T (I — 
i(A)) 2 /7 = 0(X S ) for some s £ [1,2], and tri 2 (A) x A" 1 / 7 * as A -> and 
nX l r — > oo for some r > 1, at least for univariate smoothing splines; see, for 
example, [1, 15] and [2], Section 4.2.3. For the cubic splines of Example 3.1, 
r = 4. 

Lemma 4.1. Under Condition C.l, if n~ 1 vj T (I — A(X)) 2 r] = 0(X S ) and 
tri 2 (A) = 0{X~ 1 ' r ) as A ^ and nX 1 ^ -> oo, i/ien i?i(A,E) = 0(A S + 
n~ 1 A~ 1 /' r +n~ 1 p). 

See Section 7 for the proof of the lemma. For fixed p, the random effects 
add little to the equation, and Condition C.2 is satisfied for A — > 0, nX 1 ^' — > 
oo and E of magnitude up to order 0(y/n); the optimal A x 
well within the domain. In fact, the restriction on E is not really necessary 
for Condition C.2 but to assure that R± — > 0. When p grows with n, Condi- 
tion C.2 clearly holds, though one may need to scale back the domain of E 
for i?i = o(l) to remain true. 

Theorem 4.1. Under Conditions C.l and C.2, as n— >oo, one has 

U(X, E) - Li(X, E) - -e T e = o p (Li(A, S)). 
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The proof of the theorem is given in Section 7. When the conditions of 
the theorem hold in a neighborhood of the optimal (A, E), the minimizer of 
U(X, E) will deliver nearly the minimum loss. 

The use of U(X, E) requires knowledge of a 2 , which usually is not available 
in practice. With an extra condition, the result also holds for V(X, E). 

Condition C.3. Asn^oo, {n~ l tr A(X, E)} 2 /-^ 1 tr A 2 (X, E)} -> 0. 

In the absence of random effects, Condition C.3 generally holds in most 
settings of interest. In fact, it typically holds that trA(X) X X~ l l r as A — > 
and nX l / r — > oo, of the same order as trA 2 (A). See, for example, [1, 10, 
15] and [2], Section 4.2.3. 

Lemma 4.2. // tri(A) = 0(A _1 / r ) and tri 2 (A) x A" 1 /*" as A -» and 
nX 1 ^' — > oo, then Condition C.3 aoZds /or p ap to order 0(y/n). 

The proof is to be found in Section 7. 

Theorem 4.2. Under Conditions C.l, C.2 and C.3, as n — > oo, one has 
V(X, S) - Li (A, E) - ie T e = o^U (A, S)). 

Proof. Given Theorem 4.1, the proof follows that of Theorem 3.3 in [2], 
page 66. □ 

We now turn to the case with latent random effects z T b, for which the 
loss Li(A,E) of (4.2) may not make much practical sense. Write Pz = 
Z(Z T Z) + Z T and P% = I — Pz- We consider the estimation of P% rj by 
PzVi where f) is given in (2.4); the projection ensures the identifiability 
of the target function. Accounting for the error covariance a 2 I + ZBZ T , 
one may assess the estimation precision via the loss 

Z 2 (A, E) = — (t) - vfPz(v 2 I + ZBZ T Y l P^(f 1 - rj). 

n 

Since (a 2 I + ZBZ T )- 1 = a' 2 (I - ZD Q l Z T ), where D = Z T Z + a 2 B~\ one 
may use 

(4.6) L 2 (A, E) = a 2 L 2 (X, E) = -(r) - v) T Pz(V ~ V), 

n 

which is independent of B. Write Q z = ZD~ l Z T and recall M = RE + R T (I- 
Qz) from (2.4). Plugging fj = M(r] + Zh + e) into (4.6) and taking expec- 
tation, one has the risk 

i? 2 (A,S) = £[L 2 (A,S)] 
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(4.7) = -{rf{I - M) T P^{I - M)rj 

n 

+ tv(M T P^MZBZ T ) + a 2 tv(M T P^M)}. 
From (2.5) and (2.4), one has 

(/ - A)Y = {I- Q Z )(I - RE + R T (I - Q Z ))Y 
= (P^ + P z -Qz)(v-V + Zh + e) 
= Pz-(v-v) + (Pz-Qz)(v-V + Zb + e) + Pz-e 
= Pz(v-f)) + (Pz - Qz)(Y -f,)+ Pzs. 

It follows that 

Y T (I - A) 2 Y = {rj- fjfPiiv ~V) + e T P^e + 2{r, - f,) T P£e 
+ (Y-f ] f(P z -Qz) 2 (Y-fj), 

and hence 

U(X,Z)- L 2 {\Y,)--e T e 
n 

(4.8) = l{ Y -f,f(P z -Q z f(Y-f,) 

n 

+ -(V~ v) T Pz £ ~ -£ Tp z £ + -v 2 tr A - 

n n n 

With an extra condition, U{X, S) and V(A, S) can be shown to track .^(A, S) 
asymptotically. 

Condition C.4. As n -> oo, Pi (A, S) - R 2 (X, £) = o(Pi(A, £)). 

Conditions C.2 and C.4 together imply that Pi (A, S) - P 2 (A, £) = o(R 2 (X, £)) 
and nR 2 {X, S) — ► oo. Subtracting (4.7) from (4.3), some algebra yields 

Pi(A,£)-P 2 (A,£) 

= % T (/ - M) T (P Z - Qz) 2 (/ - M)r7 

(4.9) U 

+ - tr (((P z - Q z ) + {P z - Qz)RE + R T (P z - Qz)) 2 ZBZ T ) 
n 

2 

+ — tr {{Qz + (Pz - Qz)M) T {Q z + (Pz - Qz)M)). 
n 

The following lemma confirms the feasibility of Condition C.4 for fixed p. 
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Lemma 4.3. Forfixedp, if (i) r) T (I- A(X, Y))P Z (I- A(A, S))r? = o(r) T (I- 
A(A,£)) 2 ri), (ii) E < p n Z T Z for p 2 n = o{R 1 ), and (iii) tr(Z T Z)/n is bounded, 
then Pi(A,£) -P 2 (A,£) = o(Pi(A,£)). 

The proof of the lemma is given in Section 7. Condition (i) bars (/ — j4)t? 
from being overloaded in the column space of Z; (ii) holds for £ of magnitude 
up to the order 0(y/n) when Z T Z grows at a rate 0(n), which is typical for 
fixed p. Alternatively, if p n = o(Ri) in (ii), which usually holds for bounded 
£, then (i) can be replaced by bounded rj T r]/n; see the proof in Section 7. 

Theorem 4.3. Under Conditions C.l, C.2 and C.4, as n— > oo, one has 

U(X, £) - L 2 (A, E) - -e T e = oJL 2 (X, £)). 
n 

If, in addition. Condition C.3 also holds, then 

V(X, S) - L 2 (A, E) - -e T e = o p (L 2 (X, £)). 
n 

The proof of the theorem is given in Section 7. 

Up to this point, we have considered purely real and purely latent random 
effects. In practice, one could have a mixture of real and latent random effects 
in the same setting. Partition Z = (Zi,Z 2 ) and h T = (b^b^) and assume 
bi and b 2 are independent so B is block diagonal. Define 

(4.10) L 3 (A, E) = -(f) + Zib! - 77 - ZibifP^r) + Zibi - V - Zibi) 

and i? 3 (A,£) =-E[L 3 (A,E)], where = J - ^(ZjZ^+Zj; L 3 (A,E) is a 
natural loss for Zibi real and Z 2 b 2 latent. Replace i? 2 (A, E) in Condition C.4 
by P 3 (A,S). 

Condition C.5. As n -> oo, Pi (A, S) - P 3 (A, E) = o(Pi(A, E)). 

A general result follows, of which the earlier theorems are special cases 
with nil Z\ or nil Z 2 - 

Theorem 4.4. Under Conditions C.l, C.2 and C.5, as n — > oo, one has 

U(X, E) - L 3 (A, S) - -s T e = o p (L 3 (X, £)). 
n 

If, in addition, Condition C.3 also holds, then 

V(X, S) - L 3 (A, E) - -e T s = o p (L 3 (A, £)). 
n 

The proof of the theorem follows from straightforward modifications of 
the proof of Theorem 4.3 as given in Section 7. 
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5. Empirical performance. We now present simple simulations to illus- 
trate the practical performance of generalized cross-validation in the context. 

5.1. Real random effects. First consider a setting with real random ef- 
fects covered by Theorems 4.1 and 4.2. One hundred replicates of samples 
were generated according to 

(5.1) Yi = r)(xi) + b Si +ei, » = 1,...,100, 

where rj(x) = 3sin(27rx), xi a random sample from [7(0, 1), £j ~ N(0, 0.5 2 ), 
b s ~ iV(0, 0. 5 2 ) and Sj G {1,...,10}, ten each. Cubic smoothing splines as 
described in Example 3.1 were calculated with (A U ,E U ) minimizing J7(A,E) 
of (4.4), (A^jEu) minimizing V(A, E) of (4.1) and (A m ,E m ) minimizing 
Lx(A,E) of (4.2). 

The loss Li(A,E) was recorded for the fits. For the V fit with (At,,E„), 
the variance estimate through 

2 _ Y T (I-A(X V ,Z V )) 2 Y 
{ '> tv(I - A(X V ,E V )) 

was also recorded; the variance estimate was proposed by Wahba [14] for 
independent data. The ratio <7 2 /cr 2 as part of E was "estimated" through 
Eu, S,; or E m . 

It is known that cross-validation may lead to severe undersmoothing on up 
to about 10% replicates. To circumvent the problem, a simple modification 
proved to be very effective in the empirical studies of Kim and Gu [9] . The 
modified V is given by 

^ t/ / "\ ^_ n-'YTjl -A(A,£)) 2 Y 

(5.3) y a (A,s) - {n _ ltr(/ _ aA(A)S))}2 

for some a > 1. Similarly, U can be modified by 

(5.4) UJX, S) = -Y T (I - A(\, E)) 2 Y + -a 2 ati A(X, E). 

n n 

A good choice of a is around 1.4. The U and V fits with a = 1.2, 1.4, 1.6, 1.8 
were also calculated and the loss and variance estimates recorded. 

The performances of U a (X, S) and V^,(A,E) are illustrated in Figure 1. In 
the left and center frames, the losses Li(A U) E u ) and Li(A„,E„) are plotted 
versus the minimum possible, for a = 1, 1.4. The relative efficacy of U a (X, E) 
and V a (X, E) for a = 1, 1.2, 1.4, 1.6, 1.8 is summarized in the right frame in 
box plots. Roughly speaking, U a and V a with a = 1 are "unbiased" by The- 
orems 4.1 and 4.2, and setting a > 1 introduces "bias." The top-tier perfor- 
mance may degrade slightly as a increases, but the worst cases are being 
pulled in for a up to 1.2 ~ 1.4, where one appears to have the "minimax" 
performance. 



SMOOTHING IN MIXED-EFFECT MODELS 



13 



Further details of the simulation are shown in Figure 2. In the left frame, 
A M and X v for a = 1 and a = 1.4 are plotted against each other, where a 
very small A by a = 1 is seen to be pulled to the "normal" range by a = 1.4. 
The number of cases with severe undersmoothing by cross-validation seems 
to be much less than what is typically seen in simulations with indepen- 
dent error; the phenomenon has yet to be understood. The center frame of 
Figure 2 plots the variance ratio cx^/cr^ "estimated." through. X] m , S u and 
E„. An interesting observation is the wide range of E m , especially the many 
very small values, which effectively leave the term z T h unpenalized like the 
fixed effect terms in the null space of J(rj). The "estimates" through E u 
and E„ appear far better in comparison, but remain highly unreliable. The 
upward trend of E n and E„ with increasing a is somewhat expected, as 
larger a yields smoother estimates corresponding to larger penalty terms. 
In the right frame of Figure 2, the variance estimates by (5.2) are shown 
in box plots for V fits with a = 1, 1.2, 1.4, 1.6, 1.8, demonstrating generally 
adequate performance. 

5.2. Latent random effects. For latent random effects, we keep the set- 
ting of (5.1) but replace b Si by b Ci , as in Example 3.3. One hundred repli- 
cates of samples were generated with T](xi) and £j as in Section 5.1, and 
with a E {1,2}, 50 each, &i ~ JV(0,of) for a\ = 0.5 2 , and b 2 ~ N(0,a%) for 
a\ = 0.3 2 ; the intracenter correlations are 0.25/(0.25 + 0.25) = 0.5 for c = 1 
and 0.09/(0.09 + 0.25) = 0.265 for c = 2. Cubic smoothing splines were cal- 
culated with A and E minimizing C/~(A,E), V(A,S) and L^{\, E) of (4.6). 

The simulation results are summarized in Figures 3 and 4. Figure 3 par- 
allels Figure 1, except that Li(A, E) is replaced by L2(A,E). The left and 
center frames of Figure 4 summarize the "estimation" of the two parameters 
of E; note that the data contain only one "sample" from iV(0, af) and one 
from iV(0,of). 




0.00 0.04 0.08 0.00 0.04 0.08 1.0 1.2 1,4 1.6 1.8 

i-l(^m.2m) L 1 (A, rn ,2 m ) a 



Fig. 1. Simulation with real random effects. Left and center: Performances of 
U a (X,^) and V a (A,E) with a — 1 (faded circles) and a = 1.4 (circles). Right: 
Li(A m ,S m )/Li(A u ,E u ) (fatter boxes) and Li(X m , E m )/Li (A„, E„) (thinner boxes) for 
a = 1,1.2, 1.4, 1.6, 1.8. 
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-7.5 -6.5 -5.5 -4.5 1.0 1.4 1.8 1.0 1.2 1.4 1.6 1.8 



logioO) (<*=1.4) a a 

Fig. 2. Simulation with real random effects. Left: X u (faded circles) and \ v (circles) for 
a= 1,1.4. Center: o 2 /a 2 "estimated" through E m (left thin box), S u (fatter boxes) and 
£„ (thinner boxes). Right: a 2 . The faded horizontal lines in center and right frames mark 
the true values. 



5.3. Mixture random effects. For mixture random effects, we simply add 
together b s of Section 5.1 and b c of Section 5.2, with the ten subjects nested 
under the two clusters, five each. One hundred replicates of samples were 
generated, with the specifications of rj(x), a 2 , a 2 , a\ and a 2 remaining the 
same as in Sections 5.1 and 5.2. Cubic smoothing splines were calculated with 
A and E minimizing U( A, £), V(\,Y>) and Ls(A,S) of (4.10). The counterpart 
of Figures 1 and 3 is shown in Figure 5. The "estimated" variance ratios are 
again highly unreliable, whereas a 2 demonstrates adequate performance, as 
seen in Figures 2 and 4; plots are omitted. 

6. Applications. We now apply the technique to analyze a couple of real 
data sets. 

6.1. Tumor volume. To study the sensitivity of a human prostate tumor 
to androgen deprivation, a preparation of the PC82 prostate cancer cell 
line was implanted under the skin of eight male nude mice. After 46 days, 
measurable tumors appeared on all eight mice; this day is referred to as day 
0. On day 32, all mice were castrated. The tumors were measured roughly 
weekly over a 5-month period, resulting in 16 sets of measurements on the 
eight mice. Further details concerning the data can be found in [6], along 
with some analyses using parametric models. 

We performed a nonparametric analysis of the data using the techniques 
developed. Taking the logarithm of the measured tumor volume as the re- 
sponse Y, the model of Example 3.1 was considered, 

Yi = r){xi) +b Si +ei, 

where s = 1, . . . , 8. The exponential spline as discussed in Example 3.1 was 
used to estimate rj(x), but the generalized cross-validation score was mini- 
mized at 6 = 0, yielding a cubic spline fit. The fitted r/(x) is plotted in Fig- 
ure 6 along with the data. The variance estimates are given by a 2 = 0.1490 
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Fig. 3. Simulation with latent random effects. Left and center: Performances of 
f/ Q (A,E) and V a (A,E) u;ii/i a = 1 (faded circles) and a = 1.4 (circles). Right: 
L2(X m , E m )/L2(A U , E u ) (fatter boxes) and Z/2(A m , E m )/L2(A„, E„) (thinner boxes) for 
a = 1,1.2, 1.4, 1.6, 1.8. 
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Fig. 4. Simulation with latent random effects. Left and center: a 2 / erf and o~ 2 /a 2 "esta- 
mated" through E m (7e/t i/iin &02;,), E u (fatter boxes) and T, v (thinner boxes). Right: a 2 . 
The faded horizontal line marks the true values. 
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Fig. 5. Simulation with mixture random effects. Left and center: Performances 
of f/ a (A,E) and V a (X,T,) with a = 1 (faded circles) and a = 1.4 (circles). Right: 
L3(A m ,E m )/Z/3(A u ,E u ) (fatter boxes) and L 3 (A m , E m )/L 3 (A„, E„) (thinner boxes) for 
a = 1,1-2, 1.4, 1.6, 1.8. 



and a 2 = 0.0928; remember that a 2 is trustworthy but a 2 can be grossly 
misleading, as shown in Section 5. 
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6.2. Treatment of multiple sclerosis. A randomized, double-blind clinical 
trial was conducted to study the treatment of multiple sclerosis by azathio- 
prine (AZ) and methylprednisolone (MP). Patients were assigned randomly 
to three groups: (i) the PP group receiving placebos for both AZ and MP, 
(ii) the AP group receiving real AZ and placebo MP; and (iii) the AM group 
receiving real AZ and MP. The abundance of lymphocytes bearing a protein 
called Fq receptor was measured in the form of the so-called AFCR levels. 
Blood samples were drawn prior to the initiation of therapy, at the initia- 
tion, in weeks 4, 8 and 12, and every 12 weeks thereafter for the remainder 
of the trial. A total of 48 patients were represented in the data, with 17 
on PP, 15 on AP and 16 on AM. There were "missing" values in the sense 
that blood samples were not drawn from all patients at every time point. 
Detailed descriptions of the study can be found in [7] and further references 
therein. A analysis of the data using parametric models was conducted by 
Heitjan [7]. 

We now present a nonparametric analysis of the data using the formula- 
tion of Example 3.2. Following [7], the responses Y{ are taken as the square 
roots of the AFCR measures. The model is of the form 

Yi = T)(xi,Ti) + b s . +6i, 

where the patient identification s is nested under the treatment level r. 
The "missing" values pose no problem for our treatment. The fitted cubic 
splines are plotted in Figure 7 with the data superimposed. The smoothing 
parameter was effectively set to by cross-validation, so the interaction 
771,2(2, t) consists of only parametric terms with the basis (I\ T =j] — l/3)a;, 
j = 1,2; see Example 3.2 for the notation. The variance estimates were given 
by a 1 = 12.81 and a 2 s = 6.624. 




Fig. 6. Cubic spline fits of tumor volume. Left: Tumor volume measurements (dashed 
lines) and their geometric mean (solid line). Right: Fitted rj(x) (solid line), with the ge- 
ometric mean of measurements superimposed (faded line). The castration time is marked 
by the vertical line. 
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7. Proofs. This section collects the proofs of the lemmas and theorems 
of Section 4. The following lemmas govern some of the calculations. 

Lemma 7.1. For Zb ~ N(0, ZBZ T ) and e ~ N(0,a 2 I), independent of 
each other, one has 

\ax[h T Z T CZh] = 2tv(CZBZ T C T ZBZ T ), 

Y&r[e T Ce] = 2a 4 tr{CC T ), 

V&r[b T Z T Ce] = a 2 tr(CC T ZBZ T ). 

The proof of the lemma is straightforward. 

Lemma 7.2. For M = RE + R T (I - Q z ), where E = R T (I - Q Z )R + 
nXQ, one has 

M T P Z M + M T (P Z - Q Z )M 

= M T (I-Q Z )M<I, 
{I - M) T Pz-{I - M) + (/ - M) T {P Z - Q Z ){I - M) 

= (J - M) T {I - Q Z )(I - M) < 4/. 

Proof. It is straightforward to show that M T (I — Q Z )M < I. Now for 
an arbitrary vector x, 

x T (/ - M) T (I - Q Z )(I - M)x 

= x T (I - Q z )x + x T M T (I - Q z )Mx - 2x T (I - Q z )Mx T < 4x T x, 

where the Cauchy-Schwarz inequality is used to bound the cross term. □ 

Also note that B is fixed, thus having bounded eigenvalues, and that 
X T X and XX T share nonzero eigenvalues for all matrices X. 

We are now ready for the proofs of the lemmas and theorems of Section 4. 

Proof of Lemma 4.1. Recall from (4.3), 

Ri (A, E) = -rf{I - A) 2 rj + - tr((J - A) 2 ZBZ T ) + — trA 2 . 
n n n 

Using (2.7), the first term is seen to be of the order 0(X S ), and the third 
term is of order 0(n~ 1 A~ 1 / r + n~ 1 p). Again by (2.7), 

(I - A)Z = (I- A)Z{I - (Z T {I - A)Z + E)- 1 ^/ - A)Z) 

= (I - A)Z(Z T (I - A)Z + E) _1 E, 
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thus 

z T (i - Afz < t,(z t (i - A)z + £) -1 £, 

so Condition C.l implies an upper bound on the eigenvalues of (I — A)ZBZ T (I- 
A), and the second term is of order 0{n~ l p). The proof is complete. □ 

Proof of Theorem 4.1. In light of (4.5), it suffices to show that 

(7.1) Lx(A, £) - i?i(A, £) = OpORi(A, £)), 

(7.2) n" 1 ^ + Zh) T {I - A)e = o P (i?i(A, £)), 

(7.3) n -1 (e T Ae-<r 2 trA) = o p (22 1 (A,S)). 
To see (7.1), note that 

Var[Li(A, £)] = n" 2 Var[2r/ T (I - A) 2 Zb - 2^(7 - A)Ae 

+ b T Z T (I - A) 2 Zh - 2h T Z T (I - A)Ae + s T A 2 e\. 

Since Condition C.l implies an upper bound on the eigenvalues of (I — 
A)ZBZ T (I - A), one has 

rT 2 Var[ry T (/ - ^) 2 Zb] = n~ 2 r) T (I - A) 2 ZBZ T (I - Afr, 

= n~ 1 0(R 1 ) = o(Rl), 

where the last equation is by Condition C.2. Likewise, 

rT 2 Vav[r] T (I - A)Ae] = n~ 2 a 2 r] T (I - A)A 2 (I - A)r) = o(Rf), 
tT 2 Var[b T Z T (J - AfZh] = 2n~ 2 tr((I - A) 2 ZBZ T {I - A) 2 ZBZ T ) = o{R 2 x ), 
tT 2 \av[b T Z T (I - A)Ae] = n~ 2 a 2 tr((/ - A)A 2 {I - A)ZBZ T ) = o(i? 2 ), 
n- 2 Var[eA 2 e] = 2n"Vtr,4 4 = o{R\). 

Summing up, and bounding the covariances between the terms by the Cauchy- 
Schwarz inequality, one has Var[Li(A, £)] = o(i? 2 (A, £)), and hence (7.1). 
Similar calculations yield (7.2) and (7.3), completing the proof. □ 

Proof of Lemma 4.2. From (2.7), one has tr A < tr A + p and tr A 2 > 
tiA 2 , so 

(n _1 tr^4) 2 ^ (n~ l tT A + n~ l p) 2 _i,_i/ r _i 2 xi/r\ 

V _ lf ,2 ^ - F; =Q(n X A Vr + n i p + n y A l/r 

The lemma follows as A — > and jiA 1 / 1 " — > oo. □ 
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Proof of Lemma 4.3. Recall from (4.9) that 
i? 1 (A,S)-i? 2 (A,S) 

= ^rj T {I - M) T (P Z - Qzf{I - M)r, 

+ - tr(((P z - Qz) + (Pz - Qz)RE + R T {P z - Qz)?ZBZ T ) 
n 

2 

+ — tr((Q z + (Pz - Qz)M) T {Q z + (P z - Qz)M)). 
n 

Since D = Z T Z + E < (1 + p n )Z T Z, one has - < PnPz/(l + Pn) < 
p n Pz- For the first line, noting that Pz — Qz = Pz(I — Qz) and (J — Qz)(I — 
M) = I - A, one has 

ir7 T (/ - M) T (P Z - Q z ) 2 (/ - M)rj = - V T (I - A)P Z {I - A)rj = o(Pi). 
n n 

Alternatively, with rj T rj/n bounded, 

- V T (I - MfiPz - Qz?(I - M)v 
n 

< Pn -V T (I ~ Mf(P z - Qz)(I - M) V 
n 

= 0( Pn ) 

as (J - M) T (P Z - Qz){I ~ M) < 47. For the second line, note that 

- tr((P z - Qz)ZBZ T {P z - Qz)) < p 2 J-tT{ZBZ T ) = o(Pi) 
n n 

and, with F = (P Z - Qz) xl2 RE+R T (P z - Qz) 112 < I and hence F(P Z - 
Qz)F < Pnl, that 

-tr(((P z - Qz)RE + R T (P z - Qz)fZBZ T ) 
n 

= - tr(B 1 / 2 Z T (P z - Qz) 1/2 F(Pz ~ Qz)F(Pz - Qzf^ZB 1 ' 2 ) 
n 

< p n -ti(B 1 / 2 Z T (P z - Qz)ZB 1 / 2 ) = p 2 n -tr(ZBZ T ) = o(12i); 
n n 

the cross term can be bounded by the Cauchy-Schwarz inequality. For the 

third line, note that n~ l trQ| < p/n = o(Ri) and that M T (P Z -Q Z ) 2 M < I 

has no more than p nonzero eigenvalues. The proof is now complete. □ 



Proof of Theorem 4.3. Recall from (4.7) that 

- M) T P^(I - M)rj 
+ tr(M T P^MZBZ T ) + a 2 tr(M T P^M)}. 



P 2 (A, E) = -{rj T (I - M) T Pz-(I - M)rj 
n 
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Plugging r) = M(rj + Zh + e) into (4.8) and grouping terms, some algebra 
leads to 

E/(A,E) -L 2 (A,£) - -e T e 
n 

= -(rj + Zh) T (I - M) T (P Z - Q Z ) 2 (I - M)(rj + Zh) 
n 

+ -t/ t (J - M) T (P Z - Q Z ) 2 (I - M)s + -if (I - M) T P^e 
n n 

(? - 4) 2 2 

+ -b T Z T (I - M) T (P Z - Qz?(I - M)e - -b T Z T M T P^e 

n n 

+ -e T {Qz + (Pz ~ Qz)M) T (Q z + (P z - Qz)M)e 
n 

- - {e T Ae - a 2 trA). 

n 

To prove the first part of the theorem, it suffices to show that (7.4) is of 
order o p (i?2(A, £)) and that 

(7.5) L 2 (A, S) - R 2 (X, S) = o p {R 2 {\, £)). 

Taking the expectation of the first line of (7.4), one has 

-E[( V + Zh) T (I - M) T (P Z - Qz?(I -M)(rj + Zh)] 

n 

= -r)(I - M) T (P Z - Qzf(I - M) V 
n 

+ - tv(((P z - Qz) - {Pz - Qz)RE + R T (P z - Q Z )?ZBZ T ) 
n 

= 0(R 1 -R 2 ) = o(R 2 ), 

where Condition C.4 is used. Similarly, the expectation of the fourth line of 
(7.4) gives 

-E[e T (Q z + (Pz ~ Qz)M) T (Q z + {Pz - Qz)M)e] 
n 

2 

= — tx((Q z + (P z - Qz)M) T (Q z + (Pz - Qz)M)) 
n 

= 0(R l - R 2 ) = o(R 2 ). 

For the two terms on the second line of (7.4), noting that (/ — Mf (Pz — 
Qz) 2 (I-M)<4I, 

n~ 2 Var[77 T (I - M) T (P Z - Q Z ) 2 (I - M)e] 

< 4n- 2 a 2 ri T (I - M) T (P Z - Q Z ) 2 (I - M)rj = o(i^) 
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by Conditions C.2 and C.4, and 

n- 2 Vav[r) T (I - M) T P^e] = n~ 2 o 2 r] T (J - M) T P^(I - M)r] 

= n~ l O{R 2 ) = o{R 2 2 ). 
Likewise, the third-line terms in (7.4) give 

n~ 2 Var[b T Z T (I - M) T (P Z - Qz?{I - M)e] 

< 2n" V tr(((P z - Qz) - (P Z ~ Qz)RE + R T {P z - Qz)) 2 ZBZ T ) 
= o(R 2 ), 

and 

n^ 2 Var[b T Z T M T P^e}=2n~ 2 a 2 tr(M T P z -MZBZ T )=n~ 1 0(R 2 ) = o(Rl). 

The fifth line of (7.4) is (7.3), which is of order o p (R\) = o p (i?2) by Condi- 
tion C.4. To see (7.5), note that 

Var[L 2 (A,S)] 

= n~ 2 Var[2rj T (M - ifP^MZb + 2r] T (M - ifP^Me 

+ b T Z T M T P^MZb + b T Z T M T P^Me + e T M T P^Me\. 

Using (2.6), one has MZ = AZ(Z T (I - A)Z + S) _1 S, P^MZ = P^(A - 
I)Z(Z T (I - A)Z + S)- 1 ^, thus Z T M T P^MZ < T,{Z T {I - A)Z + S^E, 
so Condition C.l implies bounded eigenvalues for P^MZBZ T M T P^. It 
then follows that 

n~ 2 Var[r] T (M - I) T P^MZb\ = n~ 2 rj T (I - M) T P^MZBZ T M T P^(l - M)r) 

= o(R 2 ), 

n- 2 Vav[r] T (M - I) T P^Me] = n~ 2 a 2 r] T (I - M) T P^MM T P^(J - M)rj 

= o(R 2 ), 

n- 2 Var[b T Z T M T P^MZb] = 2n~ 2 tv(M T P^M ZBZ T M T P^M ZBZ T ) 

= o(R 2 ), 

n- 2 Vav[b T Z T M T P z -Ms] = n- 2 a 2 tx(M T P^M ZBZ T M T P^M) = o(Rl), 

n~ 2 Yav[e T M T P^Me\ = 2n~ 2 a i tr(M T Pjjr M M T Pjjr M) = o(i2|). 

Collecting terms and bounding the covariances between the terms by the 
Cauchy— Schwarz inequality, one hasVar[L 2 (A,S)] =o(i2|(A,£)), and hence (7.5). 
The proof of the first part of the theorem is now complete. 

Given the first part of the theorem, the second part follows from the proof 
of Theorem 3.3 in [2], page 66. □ 
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8. Discussion. In this article we studied the optimal smoothing of non- 
parametric mixed-effect models through generalized cross-validation. The 
asymptotic analysis was backed by simulation studies with sample size as 
small as 100. Related practical issues such as variance estimation were also 
explored in the simulation studies. As a sequel to this work, the optimal 
smoothing of non-Gaussian longitudinal data has been studied in [4] on an 
empirical basis. The methods have been implemented in the open-source R 
package gss by the first author. 

While many correlated errors can be cast as variance components with 
low-rank random effects, some others do not conform, which spells the lim- 
itation of the techniques developed here; an important nonconforming case 
is serial or spatial correlation. On the flip side, the nonparametric rj(x) can 
be interpreted as a realization of a Gaussian process under the Bayes model 
of a smoothing spline, so there remains a potential identifiability problem of 
some sort between rj(x) and a separate serial or spatial correlation, unless 
the serial or spatial correlation is independent of x. Optimal smoothing for 
penalized likelihood estimation with serially or spatially correlated data is 
treated in a recent study by Gu and Han [3] . 
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